93 research outputs found

    Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

    Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distance metric and performs a fast pruning among candidates. Meanwhile, our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Comment: 14 pages
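The two-stage scheme described in this abstract (fast vector-distance pruning, then re-ranking of the surviving candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cosine`, `two_stage_retrieve`, and `rerank_score` are hypothetical, the embeddings are toy vectors, and the second-stage scorer is a plain callable standing in for the dual-encoder model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_retrieve(
    query_emb: Vector,
    corpus_embs: Sequence[Vector],
    rerank_score: Callable[[int], float],
    k: int = 3,
) -> List[int]:
    """Return candidate indices, pruned by stage 1 and ordered by stage 2.

    Stage 1 compares the query embedding with pre-computed candidate
    embeddings and keeps the k most similar candidates.
    Stage 2 calls `rerank_score` (an assumed, more expensive scorer that
    sees the full reference-text-candidate triplet) only on that shortlist.
    """
    by_similarity = sorted(
        range(len(corpus_embs)),
        key=lambda i: cosine(query_emb, corpus_embs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:k]
    return sorted(shortlist, key=rerank_score, reverse=True)


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    query = [1.0, 0.0]
    # Hypothetical second-stage scores, indexed by candidate id.
    scores = {0: 0.2, 1: 0.9, 2: 1.0, 3: 0.5}
    print(two_stage_retrieve(query, corpus, lambda i: scores[i], k=2))
```

In this toy run, candidate 2 has the highest second-stage score but never reaches the re-ranker because stage 1 prunes it. That is the trade-off of the design: the expensive scorer runs on only k candidates instead of the whole corpus.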

    Multi-sourced modelling for strip breakage using knowledge graph embeddings

    Strip breakage is an undesired production failure in cold rolling. Conventional studies have typically focused on cause analysis, and existing data-driven approaches rely on only a single data source, which limits the amount of available information. Hence, we propose an approach for modelling breakage using multiple data sources. Many breakage-relevant features from multiple sources are identified and used, and these features are integrated using a breakage-centric ontology which is then used to create knowledge graphs. Through ontology construction and knowledge embedding, a real-world study using data from a cold-rolled strip manufacturer was conducted using the proposed approach.

    Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge

    In this technical report, we present a solution for 3D object generation in the ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has made great progress and achieved promising results, but it remains a challenging task due to the difficulty of generating complex, textured and high-fidelity results. To resolve this problem, we study learning effective NeRFs and SDFs representations with 3D Generative Adversarial Networks (GANs) for 3D object generation. Specifically, inspired by recent works, we use the efficient geometry-aware 3D GANs as the backbone, incorporating label embedding and color mapping, which enables training the model on different taxonomies simultaneously. Then, through a decoder, we aggregate the resulting features to generate Neural Radiance Fields (NeRFs) based representations for rendering high-fidelity synthetic images. Meanwhile, we optimize Signed Distance Functions (SDFs) to effectively represent objects with 3D meshes. Besides, we observe that this model can be effectively trained with only a few images of each object from a variety of classes, instead of using a great number of images per object or training one model per class. With this pipeline, we can optimize an effective model for 3D object generation. This solution is one of the final top-3-place solutions in the ICCV 2023 OmniObject3D Challenge.

    A multi-source feature-level fusion approach for predicting strip breakage in cold rolling

    As an undesired and instantaneous failure in the production of cold-rolled strip products, strip breakage results in yield loss, reduced work speed and further equipment damage. Typically, studies have investigated this failure in a retrospective way focused on root cause analyses, and these causes are proven to be multi-faceted. In order to model the onset of this failure in a predictive manner, an integrated multi-source feature-level approach is proposed in this work. Firstly, by harnessing heterogeneous data across the breakage-relevant processes, blocks of data from different sources are collected to improve the breadth of breakage-centric information and are pre-processed according to their granularity. Afterwards, feature extraction or selection is applied to each block of data separately according to the domain knowledge. Matrices of selected features are concatenated in either a flattened or an expanded manner for comparison. Finally, fused features are used as inputs for strip breakage prediction using recurrent neural networks (RNNs). An experimental study using real-world data demonstrates the effectiveness of the proposed approach.

    An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems

    The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks. In this report, we explore the application of SAM in Weakly-Supervised Semantic Segmentation (WSSS). In particular, we adapt SAM as the pseudo-label generation pipeline given only the image-level class labels. While we observe impressive results in most cases, we also identify certain limitations. Our study includes performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable improvements over the latest state-of-the-art methods on both datasets. We anticipate that this report encourages further explorations of adopting SAM in WSSS, as well as wider real-world applications. Comment: Technical report

    Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

    Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large image corpus. One area that has not yet been explored is the reverse direction, which asks the question: what reference image, when modified as described by the text, would produce the given target image? In this work we propose a bi-directional training scheme that leverages such reversed queries and can be applied to existing composed image retrieval architectures. To encode the bi-directional query we prepend a learnable token to the modification text that designates the direction of the query and then finetune the parameters of the text embedding module. We make no other changes to the network architecture. Experiments on two standard datasets show that our novel approach achieves improved performance over a baseline BLIP-based model that itself already achieves state-of-the-art performance. Comment: 12 pages, 5 figures
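The bi-directional query construction described in this abstract can be illustrated with a small data-preparation sketch. Everything here is an assumption made for illustration: the abstract specifies a learnable token prepended to the modification text, whereas this sketch uses the fixed placeholder strings `[FWD]` and `[BWD]` as stand-ins, and the names `Query` and `make_bidirectional` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Placeholder strings standing in for the learnable direction token.
# In a real model the token would be a trainable embedding, not text.
FORWARD_TOKEN = "[FWD]"
BACKWARD_TOKEN = "[BWD]"


@dataclass(frozen=True)
class Query:
    source_image: str  # image the query starts from
    text: str          # direction token followed by the modification text
    target_image: str  # image the query should retrieve


def make_bidirectional(
    reference: str, modification_text: str, target: str
) -> List[Query]:
    """Build a forward query and its reversed counterpart from one triplet.

    Forward:  reference image + text -> target image.
    Backward: target image + text -> reference image, i.e. "which
    reference image, modified as described, would give this target?"
    """
    forward = Query(reference, f"{FORWARD_TOKEN} {modification_text}", target)
    backward = Query(target, f"{BACKWARD_TOKEN} {modification_text}", reference)
    return [forward, backward]


if __name__ == "__main__":
    for query in make_bidirectional("ref.jpg", "make it red", "tgt.jpg"):
        print(query)
```

Each annotated triplet yields two training queries that share the same modification text and differ only in the direction marker and in which image plays the source role.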

    Spin Excitation in Coupled Honeycomb Lattice Ni$_2$InSbO$_6$

    We performed an inelastic neutron scattering experiment on a polycrystalline sample of a helimagnet Ni$_2$InSbO$_6$ to construct the spin Hamiltonian. Well-defined spin-wave excitation with a band energy of 20 meV was observed below $T_{N} = 76$ K. Using the linear spin-wave theory, the spectrum was reasonably reproduced with honeycomb spin layers coupled along the stacking axis (the $c$ axis). The proposed spin model reproduces the soliton lattice induced by a magnetic field applied perpendicular to the $c$ axis. Comment: 8 pages, 5 figures

    Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning

    Machine Unlearning (MU) algorithms have become increasingly critical due to the imperative adherence to data privacy regulations. The primary objective of MU is to erase the influence of specific data samples on a given model without the need to retrain it from scratch. Accordingly, existing methods focus on maximizing user privacy protection. However, there are different degrees of privacy regulations for each real-world web-based application. Exploring the full spectrum of trade-offs between privacy, model utility, and runtime efficiency is critical for practical unlearning scenarios. Furthermore, designing the MU algorithm with simple control of the aforementioned trade-off is desirable but challenging due to the inherent complex interaction. To address the challenges, we present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU. The ConMU framework contains three integral modules: an important data selection module that reconciles the runtime efficiency and model generalization, a progressive Gaussian mechanism module that balances privacy and model generalization, and an unlearning proxy that controls the trade-offs between privacy and runtime efficiency. Comprehensive experiments on various benchmark datasets have demonstrated the robust adaptability of our control mechanism and its superiority over established unlearning methods. ConMU explores the full spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners to account for different real-world regulations. Source code available at: https://github.com/guangyaodou/ConMU